Decision tree computation in hardware utilizing a physically distinct integrated circuit with on-chip memory and a reordering of data to be grouped

ABSTRACT

A computing device for use in decision tree computation is provided. The computing device may include a software program executed by a processor using portions of memory of the computing device, the software program being configured to receive user input from a user input device associated with the computing device, and in response, to perform a decision tree task. The computing device may further include a decision tree computation device implemented in hardware as a logic circuit distinct from the processor, and which is linked to the processor by a communications interface. The decision tree computation device may be configured to receive an instruction to perform a decision tree computation associated with the decision tree task from the software program, process the instruction, and return a result to the software program via the communication interface.

BACKGROUND

Decision trees are used in a wide variety of software applications. For example, image processing programs, speech recognition programs, and search engines use decision trees to make probabilistic determinations of a class or characteristic of a data element. One type of decision tree, referred to as a binary decision tree, is formed in a branching tree structure, with an origin node branching to two child nodes, with some of the child nodes optionally functioning as parent nodes and branching in turn to two more child nodes, etc. Terminating nodes without child nodes are referred to as leaf nodes, while nodes that branch to child nodes are referred to as branch nodes.

The decision tree is traversed from top to bottom. Each branch node includes a classifier function, also referred to as a node descriptor, according to which the input data under analysis is evaluated to determine whether to branch right or left when proceeding down the tree structure. Thus, beginning at the origin, and proceeding until a leaf node is reached, at each branch node, the input data is evaluated by the classifier for the branch node, and traverses to the appropriate child node. Each leaf node represents a data class or characteristic, and also typically has assigned to it a probability of certainty that the input data is of the same class or characteristic as the leaf node. By traversing the decision tree from the origin node to a leaf node, a probabilistic determination of a class or characteristic of the input data can be made.
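As a purely illustrative software sketch of the traversal just described, the following Python fragment walks a binary decision tree from the origin node to a leaf node. The names Node, classify, and tree, the threshold value, and the convention that a true classifier result branches right are all assumptions made for this example and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # A branch node carries a classifier; a leaf node carries a label and
    # a probability. All field names here are hypothetical.
    classifier: Optional[Callable[[float], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None
    probability: Optional[float] = None


def classify(root: Node, datum: float):
    """Walk from the origin node to a leaf and return (label, probability)."""
    node = root
    while node.classifier is not None:
        # Assumed convention: a true result branches right, false branches left.
        node = node.right if node.classifier(datum) else node.left
    return node.label, node.probability


# A one-level example tree with an assumed threshold of 2.0.
tree = Node(
    classifier=lambda d: d > 2.0,
    left=Node(label="near", probability=0.9),
    right=Node(label="far", probability=0.8),
)
```

Calling classify(tree, 1.0) follows the left branch and returns the label and probability stored at that leaf.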

To date, evaluation of decision trees has been performed in software, which offers rapid prototyping and development times, flexible updating, etc. However, decision tree algorithms in software have been pushed to their limit by recent applications, such as in the computer gaming arts. For example, recently body part recognition software using decision tree analysis has been developed which can, in real time, evaluate incoming depth images from a depth camera input of a game console, to identify body parts of a moving image of a user. The output of the decision tree is used to perform skeletal modeling of the user, such that the user can interact with the game console in a three dimensional interaction space without holding a handheld controller.

However, such real-time, high throughput software-based decision tree algorithms consume significant power, and thus generate significant heat. They also are operating near their limit in terms of keeping up with the incoming images, and thus face difficulties if challenged by developers with a higher pixel count per image or an increasingly complex decision tree. Further, access to memory not in cache can create bandwidth bottlenecks and access-wait-time slowdowns in these software based algorithms. Finally, processor time devoted to classifying decision trees cannot be devoted to other internal processes of the game controller. As a result, increasing the speed while reducing the power consumption, heat, and cost of such real time software based decision tree evaluation of large data sets remains a significant challenge.

SUMMARY

A computing device for use in decision tree computation is provided. The computing device may include a processor in communication with memory and mass storage via a communications interface. The computing device may further include a software program stored in the mass storage device and executed by the processor using portions of the memory, the software program being configured to receive user input from a user input device associated with the computing device, and in response to perform a decision tree task. The computing device may further include a decision tree computation device implemented in hardware as a logic circuit distinct from the processor. The decision tree computation device is linked to the processor by the communications interface, and is configured to receive an instruction to perform a decision tree computation associated with the decision tree task from the software program, process the instruction, and return a result to the software program via the communication interface.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically shows an embodiment of a system configured to traverse decision trees in hardware.

FIG. 2A shows an example of a decision tree in accordance with embodiments of the present disclosure.

FIG. 2B shows a FIFO buffer during traversal of the decision tree of FIG. 2A.

FIG. 3A shows a process flow of an embodiment of a method for use in decision tree computation according to one embodiment of the present disclosure.

FIG. 3B shows a process flow of an embodiment of a method for traversing decision trees in a FIFO manner, in accordance with an embodiment of the present disclosure.

FIG. 4A shows an example process flow for traversing decision trees utilizing binning in accordance with an embodiment of the present disclosure.

FIG. 4B illustrates an example process flow for traversing decision trees and reordering the input data to be grouped by node descriptor, at each level of the decision tree, in accordance with an embodiment of the present disclosure.

FIG. 5 shows an embodiment of a system including a game console and depth camera, the system being configured to perform processing of depth images from the depth camera at least partially using decision tree computation in hardware.

FIG. 6 shows an example of a processing pipeline of the system of FIG. 5, from captured image to graphical output, including a portion of the processing pipeline implemented in hardware.

DETAILED DESCRIPTION

In order to address the challenges identified above, a system utilizing decision trees is described herein that at least partially traverses the decision trees using dedicated hardware. The system may be configured to generate increased efficiency over pure software decision tree traversal algorithms executed on a general purpose processor. For example, the dedicated hardware for traversing the decision trees may run at a slower clock speed than a comparable software-based implementation on a general purpose processor, potentially reducing power consumption, heat generation, and/or production costs, as well as increasing performance per unit of power consumption. Alternatively, the clock speed might be similar, but the greater efficiency of the dedicated hardware can complete the task in a much shorter amount of time. This again leads to lower power consumption, and additionally provides software with a larger execution window, e.g., more time in between input images during which to perform other tasks.

FIG. 1 schematically shows an embodiment of a decision tree computation system 100, which includes a computing device 102 for use in decision tree computation. The computing device 102 may include a processor 105 in communication with memory 106 and mass storage 107 via a communications interface 110. The computing device 102 may further include a software program 109 stored in the mass storage 107 and executed by the processor 105 using portions of the memory 106. The software program 109 may be configured to receive user input from a user input device 124 associated with the computing device 102, and to perform a decision tree task 121 in response.

The user input may be in various forms, including image information from a camera 128, which may be directly streamed from the camera, or temporarily stored in an on-board memory 126 of the user input device 124 and fetched by or otherwise transmitted to the software program 109. In FIG. 1, the user input is shown being stored as input data 127 in memory 106, and then transmitted to software program 109. As described below in reference to FIGS. 5 and 6, in one embodiment, the computing device may be configured as a game console and the user input device may be a depth camera configured to receive depth images, which are in turn processed by decision tree processing implemented at least partially by decision tree computation device 104. Thus, the input data 127 may include images, such as depth images, comprised of a plurality of pixels. Each pixel has an X, Y coordinate position within the image to which it belongs. The pixels of depth images contain depth information indicating a camera to object depth as measured for the pixel, as described below.

The computing device 102 may further include a decision tree computation device 104 implemented in hardware as a logic circuit distinct from the processor 105, which is linked to the processor 105 by the communications interface 110. The decision tree computation device 104 is configured to receive an instruction to perform a decision tree computation associated with the decision tree task 121 from the software program 109, process the instruction, and return a result 129 to the software program via the communication interface. The decision tree computation takes as input the input data, such as these depth images from the depth camera, which are loaded from memory 106 into the FIFO buffer 112 of the decision tree computation device for processing, as described below. The decision tree computation also takes as input the decision tree database data 111, which contains the decision tree itself, including node descriptors for each node in the tree. The decision tree data may be loaded into the prefetch cache 133 for processing at the decision tree computation device 104.

The hardware in which the decision tree computation device 104 is implemented may be an integrated circuit such as a programmable logic device (PLD) or application specific integrated circuit (ASIC). The integrated circuit is logically separated from the processor 105 and includes on-chip memory formed separate from the memory 106 of the computing device 102. A field programmable gate array (FPGA) and complex programmable logic device (CPLD) are two examples of suitable PLDs that may be used to implement decision tree computation device 104. Further, in another embodiment illustrated by a dash-dot line in FIG. 1, the decision tree computation device 104 may be implemented as a system-on-chip (“SoC”). In a SoC implementation, typically the processor 105, memory 106, and decision tree computation device 104, are formed as separate logic units within a single SoC integrated circuit, and the communication interface 110 includes an on-chip communications interface subsystem to enable communication between these separate logic units.

In some embodiments, the on-chip memory further includes a FIFO buffer 112, and the decision tree computation device 104 is further configured to, in response to receiving the instruction to perform the decision tree computation, load the FIFO buffer 112 with decision tree data from the decision tree database 111 in memory 106 (e.g. DDR memory) of the computing device 102, for subsequent access and processing by on-chip computation logic 118. To accomplish this, FIFO buffer 112 includes associated FIFO load logic 122 configured to load data into FIFO buffer 112 in a first in, first out manner. As shown, FIFO buffer 112 comprises “head”, “hole”, and “tail” identifiers, which will be described in greater detail below. Other logic included within decision tree computation device 104 includes, but is not limited to, prefetch logic 116 and/or sort logic 120, which function as described below.

Input device 124 may comprise, for example, memory 126 and/or camera 128. In other embodiments, input device 124 may comprise other on-board sensors, cameras, and/or processors. As illustrated, input device 124 may be configured to provide data (e.g., depth images) directly to decision tree computation device 104 via FIFO load logic 122, or the data may be first transmitted to and stored in memory 106 of computing device 102, for later retrieval by the FIFO load logic, according to the instructions of software program 109.

Communications interface 110 refers generally to one or more communications subsystems provided to enable communications among the various host machine components, and between the host machine components and the decision tree support components. Thus, it will be understood that communication interface 110 may comprise one or more discrete I/O paths, each potentially utilizing separate protocols, encodings, and/or physical interfaces. For example, computing device 102 may communicate with input device 124 via USB and with decision tree computation device 104 via Ethernet or a high speed data bus. Furthermore, computing device 102 may be configured to utilize one or more host machine interfaces, which may be a type of application programming interface (API) that enables application programs such as software program 109 to access hardware resources such as device 104 using a standardized instruction set and protocol. In the illustrated embodiment one example host machine interface is illustrated as the Simple Interface for Reconfigurable Computing (“SIRC”). Through SIRC, a single in-point 130 for inbound communications to the decision tree computing device 104 and a single out-point 132 for outbound communications from the decision tree computing device are provided for programs such as software program 109. While SIRC is a software construct, it enables communications with hardware resources such as device 104 over a hardware communications bus of communications interface 110. It will be appreciated that SIRC is merely one example technology that may be used as a host machine interface. Many other alternative implementations of host machine interfaces will be apparent to one of ordinary skill in the art.

The decision tree computation device 104 is configured to process the data it receives using hardware-based decision tree evaluation processes. In order to increase throughput, decision tree computation device 104 may utilize a prefetch logic 116 to implement a “pipelined” execution workflow. For example, prefetch logic 116 may be configured to retrieve information from decision tree database 111 and/or FIFO buffer 112. Thus, the on-chip memory of decision tree computation device 104 may include a prefetch cache 133, and the decision tree computation device may be configured to prefetch decision tree database data from the decision tree database 111 in memory 106 of the computing device 102, and store it in the prefetch cache 133 of the on-chip memory of the decision tree computation device 104. Information retrieved from database 111 (such as a decision tree data structure including nodes, node descriptors, etc.) may be used, for example, to evaluate the data at a given node via computation logic 118.

In some embodiments, FIFO buffer 112 may be divided into separate FIFOs with prefetch logic 116 in between. Such a configuration may allow prefetch logic 116 to access FIFO buffer 112 without requiring additional read ports of FIFO buffer 112. Use of multiple FIFO buffers may further increase the parallelism of decision tree computation device 104, since individual FIFOs may be accessed concurrently and independently during decision tree computation.

Furthermore, sort logic 120 may be provided and configured to sort the contents of FIFO buffer 112 such that all related data is located in consecutive FIFO entries. Locating all related entries of FIFO buffer 112 contiguously and in predictable locations may allow the associated entries to be efficiently read and/or written. Further description of such sorting and processing is provided with reference to FIGS. 2A-4, below. As described above, the result 129 of the computation of the decision tree by the computation logic 118 acting on the sorted FIFO buffer, which is kept full by the prefetch load logic and prefetch cache, is returned to the software program 109 via the out-point 132 of the host machine interface of the communications interface 110.

Computing device 102 may further include a display 108. The software program 109 is further configured to display output 131 on the display 108 associated with the computing device 102, based directly or indirectly on the result 129 received from the decision tree computation device 104. For example, output 131 that is displayed on the display 108 may be a graphical representation of a user in a virtual interaction space based on skeletal modeling of the user performed as described below with reference to FIGS. 5-6, and/or may be a graphical representation of a virtual interaction space that includes elements which are responsive to recognized natural input.

Methods for traversing decision trees in hardware will now be described in greater detail in reference to FIGS. 2A-B. FIG. 2A shows an example of a decision tree 200 in accordance with embodiments of the present disclosure. Decision tree 200 comprises root node 202, also referred to as an origin node; branch nodes 204 and 206; and leaf nodes 208, 210, 212, and 214. As illustrated, all nodes of tree 200 which are not leaf nodes are associated with two child nodes. Beginning with the root node and traversing down the tree 200 until a leaf node is reached, input data is evaluated according to the node descriptor defined for each node, in order to determine to which child branch node to proceed. For example, an evaluation at 202 may result in continuation to branch node 204 or to branch node 206. As one specific illustration, in a computer imaging application evaluation at a branch node may include comparing a characteristic of a first pixel (subject) to a characteristic of a second pixel (defined relative to the subject pixel, e.g., the pixel to the right of the subject pixel) and then comparing that result to the node descriptor. For example, the depth difference between the two pixels may be ascertained and compared to a depth difference threshold defined by the current node descriptor. In this manner, edges of objects may be detected. As another example, the depth of the subject pixel may be compared to a maximum depth threshold. In this manner, objects that are located beyond the threshold may be labeled “background”.
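The two branch-node tests described above may be sketched in software as follows. This is a minimal illustration only: the function names, the clamping of the offset pixel to the image bounds, the mapping of a large depth difference to the right branch, and the sample depth values are assumptions introduced for the example and are not taken from the disclosure.

```python
def branch_on_depth_difference(depth, x, y, offset, threshold):
    """Compare the depth gap between the subject pixel and an offset pixel
    against a threshold taken from a (hypothetical) node descriptor."""
    dx, dy = offset
    # Assumption: an offset pixel outside the image is clamped to the border.
    nx = min(max(x + dx, 0), len(depth[0]) - 1)
    ny = min(max(y + dy, 0), len(depth) - 1)
    difference = abs(depth[y][x] - depth[ny][nx])
    # Assumption: a difference above the threshold branches right.
    return "right" if difference > threshold else "left"


def is_background(depth, x, y, max_depth):
    """Label a pixel as background when it lies beyond the maximum depth."""
    return depth[y][x] > max_depth


# Illustrative 2x3 depth image (rows indexed by y, columns by x).
depth_image = [
    [1.0, 1.0, 4.0],
    [1.0, 1.1, 4.0],
]
```

With these values, the pixel at x=1, y=0 and its right-hand neighbor differ in depth by 3.0, so a threshold of 0.5 sends the pixel down the right branch, which corresponds to the edge between the near and far regions.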

At leaf nodes 208, 210, 212, and 214, data is evaluated and the resulting evaluation ends traversal of the decision tree. Evaluation at leaf nodes (e.g., leaf nodes 208, 210, 212, and 214) may comprise applying a probability distribution to the data. That is, data reaching the leaf node is said to belong to a certain class or possess a certain characteristic associated with that leaf node, with an assigned probability for the accuracy of the classification. For each data item input, the decision tree may be traversed, and the resulting leaf node determination may be returned to the software program 109, for further downstream processing.

FIG. 2B shows a FIFO buffer 250 during traversal of the decision tree of FIG. 2A. FIFO buffer 250 comprises a plurality of entries ranging from a first entry 252 to a last entry 254. Although illustrated as comprising 6 entries, it will be understood that FIFO buffer 250 may be of any length. At 260, FIFO buffer 250 is empty, and therefore first entry 252 is designated as the “head”, “hole”, and “tail”. As used herein, “head” refers to the first FIFO entry, “hole” refers to the earliest FIFO entry associated with the highest-order node, and “tail” refers to the first available FIFO entry.

At 270, FIFO buffer 250 comprises the first 3 pixels of a dataset after evaluation of node 0 at 202, and a fourth pixel still to be evaluated at node 0. Each FIFO entry comprises a pixel 272 and a target node 274. As illustrated, pixels 1, 2 and 3 are now associated with node 2. In other words, each of pixels 1, 2, and 3 has been evaluated at node 0 202 such that the next evaluation of said pixels will occur at node 2 206. Pixel 4, on the other hand, has yet to be evaluated at node 0. Furthermore, the entry of pixel 1 is designated as the “hole” since the entry of pixel 1 comprises the earliest FIFO entry associated with the highest-order node (e.g., node 2).

At 280, FIFO buffer 250 comprises the first 4 pixels of a dataset after evaluation of node 0 at 202. As shown, pixel 4 has been evaluated at node zero, and is now associated with target node 1 204. As mentioned above, it may be desirable to sort the entries of FIFO buffer 250 such that all related entries (e.g., all entries related with the same target node) are located in consecutive FIFO entries. Accordingly, all associated entries of FIFO buffer 250 may be efficiently read and/or written. Due to this behavior, pixel 4, which is associated with node 1, has been inserted at the location identified as the “hole” at 270, but which is the head at 280, and pixel 2, which is associated with node 2, has been moved to the location identified as the “tail” at 270, and the tail at 280 is moved one location to the right. Accordingly, pixel 4, which is now associated with node 1, is located before the pixels now associated with node 2. A new pixel 5 associated with node 0 is loaded into the head location.

At 290, FIFO buffer 250 comprises all pixels after evaluation at node 0 202 of FIG. 2A. At this point, all entries in FIFO buffer 250 are ready to be evaluated by the following tree level (e.g., node 1 204 or node 2 206 of FIG. 2A). Therefore, any future evaluations of FIFO buffer 250 will result in each of the pixel entries being associated with higher-order nodes (e.g., nodes 3-6 of FIG. 2A). Accordingly, the first available FIFO (e.g., illustrated as last entry 254) is marked as both the “hole” and the “tail”.

Future evaluations of FIFO buffer 250 (e.g., at node 3 208 or node 4 210 of FIG. 2A) will therefore result in data being read from an earlier location in FIFO buffer 250, evaluated, and then inserted at the “tail” (e.g., after data associated with node 2). Accordingly, in the simplest realization, FIFO buffer 250 may be sufficiently large to contain all entries for all nodes. However, in some embodiments, FIFO buffer 250 may utilize additional/different structures (e.g., circular buffer or other “wrap-around” structure) such that all associated data is located contiguously without requiring as large of a FIFO.
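One possible software reading of the “hole” and “tail” bookkeeping is sketched below for the simplified case of two target nodes. The class GroupedBuffer and its insert method are hypothetical names, and the sketch is an interpretation rather than the hardware design: because evaluation order within a node does not matter, an entry for the lower-order node can take the slot at the hole while the entry that occupied the hole is rewritten at the tail.

```python
class GroupedBuffer:
    """Keeps entries for the lower-order target node ahead of entries for
    the higher-order target node (simplified two-group case)."""

    def __init__(self):
        self.entries = []  # list of (pixel, target_node) tuples
        self.hole = 0      # index of the first higher-order entry

    def insert(self, pixel, target_node, higher_node):
        if target_node == higher_node:
            # Higher-order entries are simply written at the tail.
            self.entries.append((pixel, target_node))
        elif self.hole == len(self.entries):
            # No higher-order entries yet: write at the tail, advance the hole.
            self.entries.append((pixel, target_node))
            self.hole += 1
        else:
            # Move the entry at the hole to the tail and reuse its slot.
            self.entries.append(self.entries[self.hole])
            self.entries[self.hole] = (pixel, target_node)
            self.hole += 1


buf = GroupedBuffer()
for pixel, node in [(1, 2), (2, 2), (3, 2), (4, 1)]:
    buf.insert(pixel, node, higher_node=2)
```

After these four insertions the entry for node 1 sits ahead of the three entries for node 2, and each insertion touched at most two slots, so no full sort of the buffer was needed.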

FIG. 3A shows a process flow of an embodiment of a method 300 for use in decision tree computation according to one embodiment of the present invention. The method includes the following steps 302-306, as well as 332 and 334, implemented at a software program 109 executed on a processor 105 of computing device 102 using portions of memory 106 and mass storage 107. At 302, the method includes receiving user input from a user input device 124 associated with the computing device 102. At 304, the method includes, in response to the user input, performing a decision tree task 121. The performing of the decision tree task includes, at 306, sending an instruction from the software program to a decision tree computation device 104 implemented in hardware as a logic circuit distinct from the processor 105.

The method 300 further includes utilizing a communication interface 110 to pass communications back and forth at 308, 316, 323, and 330 between the resources of the host computing device, including software program 109 and memory 106, and the decision tree computation device 104, to enable the software program 109 to access the dedicated hardware for processing of the decision tree computation. Thus, at 308, the instruction for decision tree computation is passed via the communication interface 110 to the decision tree computation device 104. The communications interface 110 typically includes an API and a hardware communications interface over which the API is configured to communicate with decision tree computation device 104.

The method 300 further includes performing several processing steps at the decision tree computation device 104, in hardware. As discussed above, the hardware in which the decision tree computation device 104 is implemented is typically an integrated circuit formed separate from the processor on which the software program 109 is executed. The integrated circuit includes on-chip memory formed separately from the general purpose memory 106 of the processor 105 on which the software program 109 is executed. Various hardware devices, such as a FPGA or other PLD, ASIC, or SoC, etc. as described above, may be used for the decision tree computation device 104.

At 310, the method includes receiving the instruction from the software program 109, and at 312, performing a decision tree computation. Performing a decision tree computation at the decision tree computation device 104 may include various substeps. For example, at 314, the method may include prefetching decision tree database data from the memory 106 associated with the processor 105 of the computing device 102, the memory 106 having the decision tree database data pre-stored therein, as indicated at 318. At 320, the method may include storing the decision tree database data in a prefetch cache 133 of the on-chip memory of the decision tree computation device 104.

At 322, the method may include, in response to receiving the instruction to perform the decision tree computation, loading a FIFO buffer 112 of the decision tree computation device 104 with the input data 127 (e.g., input image data such as pixels) located in memory 106 of the computing device 102, as indicated at 321, for subsequent access and processing by on-chip computation logic 118. As an alternative, or in some cases in conjunction with the FIFO processing at 322, a binning process may be utilized, as indicated at 324. In such case, the method may include, at 324, in response to receiving the instruction to perform the decision tree computation, binning decision tree data loaded from the memory of the computing device 102 into an off-chip memory location in the memory 106 of the computing device 102 during processing of the decision tree data by the on-chip computation logic 118, for subsequent retrieval and processing.

At 328, the method includes returning the result 129 to the software program 109, via the communications interface 110, which passes the result to the software program at 330. At 332, the result is received by the software program 109. Finally, at 334, the method includes displaying output 131 on a display 108 associated with the computing device 102, based directly or indirectly on the result received from the decision tree computation device 104.

FIG. 3B shows a process flow of an embodiment of a method 350 for traversing decision trees in accordance with an embodiment of the present disclosure. At 352, method 350 comprises recognizing a dataset comprising a plurality of elements of input data 354. Input data 354 may include, for example, pixels of an image received from a camera (e.g., camera 128 of FIG. 1). In other embodiments, as described in reference to FIG. 1, input data 354 may be received via a computing device (e.g., computing device 102 of FIG. 1) connected to a camera (e.g., camera 128 of FIG. 1). It will be understood that, although not shown, method 350 may further comprise additional processing of input data 354 after being recognized at 352. Furthermore, in some embodiments, input data 354 may be processed by a separate device (e.g., input device 124 and/or computing device 102 of FIG. 1) before being recognized. Such processing (e.g., image segmentation) may identify a subset of input data 354 on which the remaining elements of method 350 are performed, thus potentially reducing time, power consumption, and/or complexity.

At 356, method 350 further comprises loading a data structure (e.g., FIFO buffer 250 of FIG. 2B). The data structure may include a plurality of entries 358, each comprising input data 354 received at 352 and a target node 360. Target node 360 defines the node at which the associated input data 354 is to be evaluated. Accordingly, during loading of the FIFO, all entries 358 are associated with the root node. The data structure may be configured such that all data associated with a given node is located in consecutive memory locations in order to facilitate efficient access of said data. It will be understood that the data structure may not be sorted after the initial load at 356 since all entries 358 are associated with the root node, and because the evaluation of data at a given node is order-independent (e.g., pixels A and B associated with the same node can be evaluated in any order).

At 362, method 350 comprises determining if all nodes have been evaluated. If all nodes have been evaluated (e.g., the entire decision tree has been traversed and all data has been evaluated at leaf nodes), method 350 ends. If, however, all nodes have not been evaluated, method 350 moves to 364.

At 364, method 350 comprises retrieving the node descriptor for the current node. The node descriptor may be retrieved, for example, from an external memory system (e.g., decision tree database 111 in memory 106 of FIG. 1). In other embodiments, the node descriptor may be retrieved from local (e.g., on-chip) storage. If the current node is a branch node, the descriptor may comprise data to which input data 354 is compared in order to determine a branching direction. If the current node is a leaf node, the descriptor may comprise data used to categorize the input data 354 (e.g., a class or characteristic, and an associated probability as described above).

It will be understood that, in accordance with an embodiment of the present disclosure, the node descriptor is retrieved once for each node and utilized for all entries 358 defining the current node as the target node 360. Accordingly, the descriptor data is only read once, and said read only occurs when the descriptor data is needed. Such a feature may be realized due to the sorting schema of the lookup structure (e.g., all associated input data 354 is located in contiguous locations). It will be further understood that the descriptor data is defined through modeling based on test data sets.

At 366, method 350 comprises determining if a particular entry is related to the current node. If the particular entry is not related to the current node, method 350 moves to 368. If, however, the entry is related to the current node, method 350 continues to 370 where the entry is evaluated at the current node. Evaluation may comprise, for example, comparing an input datum (e.g., a pixel) to a node descriptor (e.g., a maximum depth threshold). Various other node descriptors may be utilized, such as the relative contrast threshold for two pixels, relative depth threshold between two pixels, etc.

If the current node is a branch node, method 350 further comprises, at 372, defining a new target node. The target node defines the node at which the current entry will next be evaluated. If the current node is instead a leaf node, method 350 skips 372 and instead proceeds directly to 368. The decision procedure for a binary tree selects either the left-child or the right-child node.

At 368, method 350 comprises determining if all entries associated with the current node have been evaluated. If all entries have not been evaluated for the current node, method 350 continues to 374. At 374, method 350 moves to the next entry and method 350 continues evaluating the remaining entries at 366. If, however, all entries associated with the current node have been evaluated, method 350 continues to 376. At 376, method 350 comprises proceeding to the next node. In other words, all data is evaluated at a given node, and each node is evaluated in order (e.g., node 1 is evaluated after node 0 and node 2 is evaluated after node 1, etc.) until all data has been evaluated at a leaf node.
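As a non-limiting illustration, the node-by-node evaluation of method 350 may be sketched in Python as follows. The sketch rests on several assumptions that are not part of the disclosure: nodes are numbered in level order so that the children of node i are nodes 2i+1 and 2i+2, each branch descriptor is a single threshold, and the names `evaluate_by_node`, `pending`, and `toy_tree` are hypothetical. A per-node list stands in for the sorted lookup structure.

```python
from collections import defaultdict


def evaluate_by_node(data, descriptors):
    """Evaluate all data node by node, reading each descriptor at most once.

    descriptors maps a node index to ("branch", threshold) or ("leaf", label).
    Assumption: children of node i are 2*i + 1 (left) and 2*i + 2 (right).
    """
    pending = defaultdict(list)   # target node -> entries waiting at that node
    pending[0] = list(data)       # every entry starts at the origin node
    results = []                  # (datum, leaf label) pairs
    descriptor_reads = 0

    for node in sorted(descriptors):          # node 0, then node 1, ...
        entries = pending.pop(node, [])
        if not entries:
            continue                          # descriptor is read only when needed
        kind, value = descriptors[node]       # single read for all entries here
        descriptor_reads += 1
        for datum in entries:
            if kind == "leaf":
                results.append((datum, value))
            else:
                # Define the new target node for this entry.
                child = 2 * node + 1 if datum < value else 2 * node + 2
                pending[child].append(datum)
    return results, descriptor_reads


# Hypothetical five-node tree: two branch nodes and three leaf nodes.
toy_tree = {
    0: ("branch", 10),
    1: ("branch", 5),
    2: ("leaf", "far"),
    3: ("leaf", "near"),
    4: ("leaf", "mid"),
}
results, reads = evaluate_by_node([3, 7, 12, 4], toy_tree)
```

In this sketch each of the five descriptors is read once even though four data are evaluated, and a node that receives no entries is never read.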

In addition to the sorting FIFO configuration described above, the traversal of decision trees may utilize other approaches, which will now be described in greater detail. In some embodiments, the traversal of the decision tree may not include evaluation at the leaf nodes. In such an embodiment, the dataset is iterated through the decision tree until reaching a leaf node. At this point, each datum is stored to memory along with an identifier (e.g., index) of its associated leaf node. A software layer such as software program 109 may then be configured to read the stored information and to apply the leaf node descriptor (e.g., probability distribution) accordingly. Such an embodiment may be utilized to reduce the amount of memory needed to store the descriptor data, or, if the amount of memory is constrained, to increase the memory headroom in anticipation of future algorithm updates.

In some embodiments, the decision tree (e.g., decision tree 250 of FIG. 2B) may comprise one root tree and one or more subtrees. Subtrees may be utilized, for example, due to memory constraints preventing the entire tree from residing in low-latency memory of the decision tree computation device 104. Each leaf node of the root tree may therefore be associated with the root node of one of the subtrees. Each such root tree leaf node will be referred to as a “sub-leaf”. Accordingly, data arriving at a sub-leaf node may be passed to the root node of the associated subtree. By dividing the tree into individual portions, each portion of the tree may be loaded from a higher-latency memory location to a lower-latency memory location, traversed, and then overwritten by the next tree portion. This may reduce the amount of low-latency memory needed, and thus possibly the component cost.

In some embodiments, it may be desirable to minimize the amount of on-chip RAM of the decision tree hardware (e.g., an FPGA) in order to reduce overall cost. Accordingly, a technique known as “binning” may be employed. In contrast to the sorting FIFO described above, binning relies on a plurality of locations (“bins”) into which associated data can be inserted (e.g., sorted). Bins may comprise memory locations and/or data structures (e.g., buffers). The decision tree computation device 104 may be configured to perform binning of decision tree data previously loaded into on-chip memory from the memory 106 of the computing device 102, back into an off-chip memory location in the memory 106 of the computing device, in a binned manner, to make said offloaded data available for later retrieval.

Two embodiments utilizing “binning” will now be described in more detail. Both embodiments further utilize subtrees as described above. FIG. 4A shows an example dataflow 400 for traversing decision trees utilizing binning in accordance with an embodiment of the present disclosure. Dataflow 400 comprises a frame buffer 402 configured to buffer a plurality of pixels 404. Buffer 402 may be an element of an input imaging device (e.g., input device 124 of FIG. 1). In other embodiments, frame buffer 402 may be located within memory, either volatile or non-volatile, of a decision tree computation device (e.g., decision tree computation device 104 of FIG. 1). Pixels 404 are transferred from buffer 402 to on-chip buffer 406. It will be understood that “on-chip” refers to a location within the device packaging of the chip or chips that house the decision tree computation device 104.

Pixels 404 are then iterated through root tree 408, and thus reach one of sub-leafs 410 of root tree 408. Each pixel of pixels 404 is then written to on-chip buffer 412 along with any associated data resulting from traversal of tree 408 (e.g., one or more data identifying an associated sub-leaf 410). The contents of buffer 412 are then written to one of bins 414, which have been described above. In some embodiments, there may be individual buffers for each sub-leaf 410, and therefore no additional data beyond pixels 404 may be written to the buffers. In some embodiments, such as where a single buffer is used, buffers 406, 412, 416, and 422 may comprise a shared buffer and/or set of shared buffers.

Since all data written to the same location (e.g., one of bins 414) is associated with the same subtree and all data begins traversal of the associated subtree at the root node, the data stored in bins 414 does not include “next node” descriptors. Furthermore, pixels 404 may not need to be written, but rather an identifier of each of pixels 404 may be written. This identifier may then be used by one of the subtrees to access the associated data (e.g., one of pixels 404 stored in frame buffer 402). Such a configuration may reduce the amount of data to be written, thus increasing throughput.

Continuing with dataflow 400, pixels 404, and/or associated data as described above, are loaded into on-chip buffer 416. Pixels 404 are then iterated through one of subtrees 418, and thus reach one of leafs 420 of one of subtrees 418. Each pixel of pixels 404 is then written to on-chip buffer 422 along with any associated data resulting from traversal of tree 418 (e.g., one or more data identifying an associated leaf 420). In some embodiments, there may be individual buffers for each leaf 420, and therefore no additional data beyond pixels 404 may be written to the buffers. Dataflow 400 further comprises the contents of buffer 422 being written to storage 424 (e.g., higher-latency memory).

Since bins 414 may be simple memory locations that cannot be directly queried regarding how much data has been written, each bin 414 may be associated with a counter, hardware-based and/or software-based, configured to keep track of the amount of data written to bins 414. Once the root tree 408 has been traversed for all data and data has been stored in bins 414, each of the counters may be queried to determine if any data is associated with subtrees 418. If a given counter is non-zero (i.e., there is data associated with the subtree), data is then loaded into buffer 416 and iterated through one of subtrees 418.
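As a non-limiting illustration, the counter-based binning of dataflow 400 may be sketched in Python as follows. The function names `bin_by_subleaf` and `run_subtrees`, the fixed `bin_capacity`, and the callables `root_traverse` and `subtree_traverse` are hypothetical stand-ins for the root tree and subtree traversals; only pixel identifiers, not pixel data, are written to the bins.

```python
def bin_by_subleaf(pixel_ids, root_traverse, num_bins, bin_capacity):
    """Traverse the root tree for each pixel and write its id into a bin.

    root_traverse(pixel_id) returns the index of the sub-leaf reached.
    Bins are plain pre-sized memory regions, so a separate counter per bin
    records how many entries have been written.
    """
    bins = [[None] * bin_capacity for _ in range(num_bins)]
    counters = [0] * num_bins
    for pid in pixel_ids:
        b = root_traverse(pid)
        bins[b][counters[b]] = pid   # no "next node" field is stored
        counters[b] += 1
    return bins, counters


def run_subtrees(bins, counters, subtree_traverse):
    """Query each counter; load and traverse only the non-empty bins."""
    storage = []                     # stands in for higher-latency storage
    for b, count in enumerate(counters):
        if count == 0:
            continue                 # no data reached this subtree
        for pid in bins[b][:count]:
            leaf = subtree_traverse(b, pid)
            storage.append((pid, b, leaf))
    return storage


# Toy traversal functions, purely for demonstration.
bins, counters = bin_by_subleaf(range(6), lambda pid: pid % 3,
                                num_bins=4, bin_capacity=8)
storage = run_subtrees(bins, counters, lambda b, pid: pid // 3)
```

Here the fourth bin receives no data, so its counter remains zero and the associated subtree is never traversed.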

After dataflow 400, the data in storage 424 may then be accessed (e.g., via a software layer) in order to classify said data (e.g., via application of a probability distribution). In other embodiments, the data of storage 424 may comprise one or more characterizing data (e.g., leaf identifier) based on the traversal of tree 408 and one of trees 418.

In addition to decreasing the amount of on-chip RAM needed, dataflow 400 may further increase read throughput due to the read-backs being consecutively spaced. However, such a configuration may also have drawbacks. For example, each of root tree 408 and subtrees 418 is retrieved in its entirety before being traversed, even if not all nodes of the tree are going to be used. This contrasts with the sorting FIFO implementation, which is configured to retrieve individual node descriptors as needed. Furthermore, each counter for each bin may require a dedicated memory location to store the counter value and/or associated counter hardware (if the counter is not software-based), thus increasing resource utilization.

In a second embodiment utilizing binning, a set of linked lists comprising one or more buffers and associated counters may be employed. Like the first binning embodiment, each datum traverses the root tree until reaching a sub-leaf. However, instead of writing the datum to a memory location associated with a particular subtree, one or more linked lists are queried for a buffer associated with the subtree. If such a buffer exists, the datum is added into said buffer and an associated counter is updated. If no linked buffer has been instantiated yet for a given subtree, a new buffer is first allocated, added to the linked lists, and associated with the subtree. This ensures that no empty linked lists are allocated.

When a buffer is filled and is about to overflow, the contents are then written to memory along with a pointer identifying the next buffer of the same subtree, if applicable. In some embodiments, the same buffer for a given subtree can be re-used after writing the previous contents to memory. Due to the finite size of the buffers, memory may be pre-allocated at the same time a buffer is allocated to ensure an available memory location exists in the event of an overflow.

Once the root tree is traversed for all data, all remaining buffers (i.e., buffers comprising data not previously written to memory) are written to memory. Since the remaining buffers are not completely filled, one or more data elements characterizing the occupancy of each buffer are written along with the data itself. This data may include the size of the buffer and/or a single bit identifying whether the buffer is full or partial. This data may be written to the same field used to store the next buffer location.
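As a non-limiting illustration, the linked-buffer variant may be sketched in Python as follows. The class `LinkedBinWriter`, its method names, and the three-part block layout (data, `is_last` bit, shared field) are hypothetical; the `is_last` bit plays the role of the full/partial indicator, and the shared field holds either the next block location or the occupancy of the final block. A Python list stands in for off-chip memory.

```python
class LinkedBinWriter:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []    # simulated memory: slots holding written blocks
        self.buffers = {}   # subtree -> (buffer contents, reserved memory slot)
        self.heads = {}     # subtree -> slot of the first block in its chain

    def _reserve(self):
        """Pre-allocate a memory slot so an overflow always has a destination."""
        self.memory.append(None)
        return len(self.memory) - 1

    def add(self, subtree, datum):
        if subtree not in self.buffers:
            # Allocate lazily, so no empty list is ever created.
            slot = self._reserve()
            self.buffers[subtree] = ([], slot)
            self.heads[subtree] = slot
        data, slot = self.buffers[subtree]
        if len(data) == self.capacity:
            # Buffer is about to overflow: write it out with a next pointer.
            nxt = self._reserve()
            self.memory[slot] = (data, False, nxt)
            data, slot = [], nxt            # a fresh list models buffer re-use
            self.buffers[subtree] = (data, slot)
        data.append(datum)

    def finish(self):
        """Write remaining buffers; the shared field now stores occupancy."""
        for data, slot in self.buffers.values():
            self.memory[slot] = (data, True, len(data))
        self.buffers.clear()

    def read(self, subtree):
        """Follow the chain of blocks written for one subtree."""
        out, slot = [], self.heads.get(subtree)
        while slot is not None:
            data, is_last, field = self.memory[slot]
            if is_last:
                out.extend(data[:field])    # field is the occupancy count
                slot = None
            else:
                out.extend(data)
                slot = field                # field is the next block location
        return out


writer = LinkedBinWriter(capacity=2)
for subtree, datum in [(0, "a"), (1, "b"), (0, "c"), (0, "d"), (0, "e")]:
    writer.add(subtree, datum)
writer.finish()
```

In this sketch subtree 0 produces a chain of two blocks while subtree 1 produces a single partial block, and a subtree that never receives data consumes no memory.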

By utilizing buffers instead of memory locations as in the first binning embodiment, the second binning embodiment may utilize a smaller amount of memory, thus possibly reducing costs. However, the increased latency resulting from memory writes during buffer overflows and from querying the linked lists may in turn decrease overall throughput in some applications.

FIG. 4B illustrates another method 450 for traversal of a decision tree in hardware. The method 450 illustrated in FIG. 4B minimizes fetches to the node database by reordering the input data (e.g., pixels) to be grouped by node descriptor, at each level of the decision tree. Such a method is useful when the depth map and intermediate databases are kept on-chip and the node database is off-chip, which slows access times to the node database.

In method 450, the input image data (e.g., pixels) are sorted into a linear list based on the node address. All pixels that visit the same node appear grouped together in the sorted list. The list is then processed in the sorted order so that a node descriptor can be fetched once for all pixels at a given node. An example algorithm to accomplish this follows.

First, a current list is initialized that contains the following elements for every pixel to be classified: (1) X, Y coordinates of the pixel in the depth map, and (2) the node address of the pixel, which is set to ROOT to indicate the root of the tree. It will be appreciated that the X, Y address for each pixel is not illustrated in FIG. 4B; however, the node addresses are illustrated using the nomenclature L, LR, LRL, etc.

Once the current list is initialized, for each level of the tree, the following steps are performed. A next list is initialized which is the same length as the current list, a left pointer is initialized to point to the beginning of the next list, and a right pointer is initialized to point to an end of the next list. For each pixel in the current list, the algorithm then determines whether the left or right branch of the tree is to be taken, and outputs a classification result, an example of which follows.

classification_result=classify_function(pixel_node_addr, pixel_x_y)

The algorithm then computes a new node address for the pixel from the left/right classification result and the current node address, an example of which follows.

new_node_addr=compute_node(pixel_node_addr, classification_result)

Next, the pixel is written into the next list, as follows.

if (classification_result==left) {*left_ptr++=(pixel_x_y,new_node_addr)}

if (classification_result==right) {*right_ptr--=(pixel_x_y,new_node_addr)}

Once all pixels are written into the next list, then the next list can be set to the current list and the process repeated for the next level of the tree, until no levels remain to be evaluated.
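As a non-limiting illustration, the level-by-level reordering described above may be sketched in Python as follows. The function names `traverse_levels` and `toy_classify` are hypothetical, an empty string stands in for the ROOT node address, and the toy classifier simply inspects one bit of the X coordinate per level; an actual classifier would compare the pixel against the node descriptor fetched for its node address. The sketch also counts descriptor fetches, assuming one fetch each time the node address changes while reading the current list.

```python
def toy_classify(node_addr, xy):
    """Hypothetical classifier: True means take the left branch."""
    level = len(node_addr)
    return ((xy[0] >> level) & 1) == 0


def traverse_levels(pixels_xy, classify, num_levels):
    # Current list: (x_y, node_address) for every pixel, all starting at ROOT.
    current = [(xy, "") for xy in pixels_xy]
    fetches = 0
    for _ in range(num_levels):
        nxt = [None] * len(current)           # next list, same length
        left, right = 0, len(current) - 1     # left and right write pointers
        last_addr = None
        for xy, addr in current:
            if addr != last_addr:             # new group: fetch its descriptor
                fetches += 1
                last_addr = addr
            go_left = classify(addr, xy)
            new_addr = addr + ("L" if go_left else "R")
            if go_left:
                nxt[left] = (xy, new_addr)    # fill from the left end
                left += 1
            else:
                nxt[right] = (xy, new_addr)   # fill from the right end
                right -= 1
        current = nxt                         # next list becomes current list
    return current, fetches


final_list, fetches = traverse_levels([(x, 0) for x in range(8)],
                                      toy_classify, num_levels=3)
addresses = [addr for _, addr in final_list]
```

For eight pixels and three levels, this sketch leaves the list ordered LLL, RLL, RRL, LRL, LRR, RRR, RLR, LLR and performs seven descriptor fetches (one, two, and four at successive levels), since pixels sharing a node address remain contiguous after every pass.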

FIG. 4B illustrates how pixels with the same node address stay grouped together in the FIFO buffers. In FIG. 4B decision tree 452 includes a plurality of levels, each having node descriptors 454. Current list FIFO buffer and next list FIFO buffers are illustrated at 456, with the understanding that “current” and “next” are relative terms which indicate a current level of the tree being evaluated and a next level of the tree to be evaluated. Thus, when evaluating level 1, current list FIFO buffer is 456A and the next list FIFO buffer is 456B. Node addresses are identified by the prior branch decisions in the tree. For example, a pixel with a node address of RRL took the right branch at the root of the tree, the right branch at level 1 and the left branch at level 2.

At the first level of the tree all pixels are read from the depth map. If a pixel is valid it can be classified. Pixels which take the left branch are written starting at the left side (or bottom) of the FIFO buffer. Similarly, pixels which take the right branch are grouped together as they are written starting at the right side (or top) of the FIFO buffer.

At level 1 the FIFO buffer is read from the left side of the FIFO buffer, classified, and then written to the left or right side of a second FIFO buffer (the next list). Thus, all pixels with a node address of ‘L’ are processed together. These are then sorted by writing pixels which branch to the left in from the left side and those which branch to the right in from the right side of the FIFO buffer. Eventually, the pixels with a node address of ‘R’ are classified. Once this is done the pixels with the same node address will be grouped together in the second FIFO buffer, which corresponds to the next list described above.

At this stage, it will be appreciated that the memory which stores the ‘L’ and ‘R’ pixels is now free and can be used to store the results of the next level of classification which will produce pixels with node addresses ‘LLL’, ‘RLL’, ‘RRL’, ‘LRL’, ‘LRR’, ‘RRR’, ‘RLR’ and ‘LLR’.

Several optimizations can be made to the system 100 to improve performance and minimize power consumption when using the embodiment of FIG. 4B. First, the decision tree can be organized to reflect the FIFO order to maximize the number of accesses to the same page address in memory (e.g., SDRAM). It will be appreciated that the decision tree is typically large and thus normally is stored in DDRx memory. This memory is more efficient in terms of bandwidth and power when successive accesses stay within the same page, typically defined as the same 2K block of memory. Ordering the node descriptors of the decision tree in the same way as the pixels in the FIFO buffers will maximize the number of “same page” accesses.
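As a non-limiting worked example, the effect of descriptor ordering on page locality may be sketched in Python as follows. The 512-byte descriptor size is an assumption chosen so that four descriptors share one 2K page, and the function name `page_switches` is hypothetical. The sketch counts how often consecutive descriptor fetches fall in different pages for two memory layouts of the eight node descriptors visited in the FIFO order given above.

```python
PAGE_BYTES = 2048          # "same page" block size from the description
DESCRIPTOR_BYTES = 512     # assumed descriptor size (four per page)


def page_switches(access_order, layout):
    """Count page changes between consecutive descriptor fetches.

    layout lists node addresses in the order they are stored in memory.
    """
    slot = {addr: i for i, addr in enumerate(layout)}
    pages = [slot[a] * DESCRIPTOR_BYTES // PAGE_BYTES for a in access_order]
    return sum(1 for a, b in zip(pages, pages[1:]) if a != b)


# Order in which node descriptors are needed, following the FIFO contents.
fifo_order = ["LLL", "RLL", "RRL", "LRL", "LRR", "RRR", "RLR", "LLR"]

switches_fifo_layout = page_switches(fifo_order, fifo_order)            # 1
switches_sorted_layout = page_switches(fifo_order, sorted(fifo_order))  # 4
```

Under these assumptions, storing the descriptors in FIFO order yields a single page change across the eight fetches, whereas an alphabetical layout yields four.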

In addition, linked lists may be used to minimize memory consumption. Blocks of memory in the current list described above are freed up as the FIFO buffer is processed. Thus, pointers to these freed up blocks can populate a buffers available list. The next list can then consume the buffers listed in the buffers available list. As a result, memory consumption for the FIFO buffers may be reduced by approximately half.

Further, instead of filling the lists from each end as described above, a separate FIFO buffer may be used for each node address. This may be accomplished without consuming significant blocks of memory by utilizing a list of pointers to free blocks of memory. The computation logic may then classify a pixel via node descriptors at two or more levels of the decision tree in the same round of processing.

With a design that enables traversal of more than one level of the tree for each pixel, the computation logic can read the pixel FIFO buffer once and store it in a local (on-chip) register or cache for the time it takes to classify the pixel at two levels (including all reads). Because the FIFO entries can be large, this can significantly reduce the bandwidth required to access on-chip memory.

As another modification, the node addresses for nodes at the top of the decision tree can be made smaller. Using a different data structure at the top of the decision tree which takes advantage of these smaller addresses will reduce the bandwidth required for FIFO accesses.

Further, it is possible to minimize internal memory usage by combining depth first and breadth first tree traversal. For example, suppose at level 1 the pixels with a node address of ‘LL’ are written to a FIFO buffer on chip and pixels with node addresses of ‘RL’, ‘LR’ and ‘RR’ are written off chip. This will cut the amount of on-chip memory utilized to about one fourth if the pixels are evenly distributed. The ‘LL’ pixels can then be fully classified down to the bottom levels of the tree. Subsequently, the ‘RL’ pixels can be moved on chip and classified down to the bottom level of the tree.

It will be appreciated that the system of FIG. 1 may be configured to implement the method of FIG. 4B. Accordingly, computing device 102 may be configured to perform the decision tree computation in response to the decision tree task instruction, by retrieving a plurality of items of input data 127 from memory of the computing device, retrieving decision tree database data 111 including a decision tree having a plurality of node descriptors from the memory 106 of the computing device 102, initializing a current list containing the plurality of items of input data 127, evaluating each item of input data 127 based on node descriptors at a current level in the decision tree, and reordering the evaluated input data to be grouped by node descriptor in a next list to be evaluated according to the node descriptors at a next level of the decision tree. In this manner, accesses from decision tree computation device 104 to off-chip memory 106 may be minimized, saving bandwidth and reducing processing times.

FIG. 5 illustrates an exemplary embodiment of a system 500 in which the computing device 102 of system 100 described above is configured as a game console 502. The game console 502 has an associated display 508 and a user input device 524 equipped with a depth camera 528. In the illustrated embodiment the user input device 524 is formed in a separate housing and linked to the game console, for example, by a wired or wireless connection such as a USB connection. The depth camera is configured to capture a depth image of a real 3D interaction space 4 in which a user 5 may interact with the game console 502 by performing three dimensional movement of the user's body. Since the user may control the game console 502 directly with body movements, such user input is referred to as natural user input. It will be appreciated that the features of game console 502 are similar to those of computing device 102 described above, and therefore a detailed description of such similar features will be omitted for the sake of brevity.

FIG. 6 shows a simplified processing pipeline 550 in which user 5 in the 3D interaction space 4 is modeled as a virtual skeleton 568 that can serve as a control input for controlling various aspects of a software program such as game program 509 executed on processor 505 of game console 502 of system 500. FIG. 6 shows five stages of the processing pipeline 550: image collection 552 from depth camera 528, motion estimation 554, body part recognition 562, skeletal modeling 566, and game output 531 to display 508. It will be appreciated that while much of this processing pipeline 550 is executed by game program 509 on processor 505 of the game console 502, a portion of this processing pipeline 550 may be implemented at the decision tree computation device 504 of the game console 502. In the illustrated embodiment, body part recognition 562 is accomplished by decision tree analysis that occurs at the decision tree computation device 504, while image collection 552, motion estimation 554 and skeletal modeling 566 occur at game program 509 on processor 505. It will be appreciated that the processing pipeline 550 may include additional steps and/or alternative steps to those depicted in FIG. 6 without departing from the scope of this disclosure. The processing pipeline 550 will now be described in greater detail below.

During image collection 552, user 5 and the rest of the 3D interaction space 4 may be imaged by a capture device such as depth camera 528. It will be appreciated that multiple users may be imaged concurrently in the 3D interaction space. The depth camera is used to track a position of each of a plurality of joints of a user (e.g., user 5). During image collection 552, the depth camera may determine, for each pixel, the depth of a surface in the observed scene relative to the depth camera. Virtually any depth finding technology may be used without departing from the scope of this disclosure. Example depth finding technologies are described below.

During motion estimation 554, the depth information determined for each pixel may be used to generate a depth map 556. It will be appreciated that each pixel includes associated depth information indicating a measured camera-to-imaged-object depth. During motion estimation, each pixel in the depth images may be tagged with player information, enabling only those pixels for each player to be identified to produce a player-specific depth map 556, if desired. This information may be stored in a depth image buffer or other data structure that includes a depth value for each pixel of the observed scene, and also includes tagged player information. In FIG. 6, depth map 556 is schematically illustrated as a pixelated grid of the silhouette of user 5. This illustration is for simplicity of understanding, not technical accuracy. It is to be understood that a depth map generally includes depth information for all pixels, not just pixels that image the user 5. Depth mapping is typically performed by game program 509 executed by the processor 505 of the game console; however, in some embodiments specially configured processors (e.g., ASIC, PLD, SoC, etc.) may be included within the user input device 524 for this purpose.

The depth map 556 with player information is passed to game program 509, for body part recognition 562 of the specific body parts of each player within the depth map. The body part recognition 562 is accomplished by decision tree analysis using decision tree database data stored in memory 106 of the game console 502. In the illustrated embodiment the game program 509 sends an instruction to the decision tree computation device 504 to perform a decision tree computation for the purposes of body part recognition 562, and the decision tree computation device computes the decision tree computation according to the methods described above in relation to FIGS. 1-4. The result of the body part recognition is a depth map in which the pixels have further been tagged with body part recognition data, such as “head,” “right arm,” etc. This result of the decision tree computation is returned to the game program 509, which in turn performs downstream processing thereon. It will be appreciated that the instruction to perform the decision tree computation may be explicit or implicit. Thus, the body part recognition may be performed programmatically by the decision tree computation device upon detecting that data is available for analysis without an explicit message or instruction to do so from the game program 509.

During skeletal modeling 566, skeletal fitting algorithms are applied to the result of the body part recognition 562 received from the decision tree computation device 504, namely, the body-part-recognized depth map 556. In this manner a virtual skeleton 568 may be generated that fits to the labeled body parts of the depth map 556. The virtual skeleton 568 provides a machine readable representation of user 5 as observed by depth camera 528. The virtual skeleton 568 may include a plurality of joints, each joint corresponding to a portion of the user 5. Virtual skeletons 568 in accordance with the present disclosure may include virtually any number of joints, each of which can be associated with virtually any number of parameters (e.g., three dimensional joint position, joint rotation, body posture of corresponding body part (e.g., hand open, hand closed, etc.) etc.). It is to be understood that a virtual skeleton 568 may take the form of a data structure including one or more parameters for each of a plurality of skeletal joints (e.g., a joint matrix including an x position, a y position, a z position, and a rotation for each joint). In some embodiments, other types of virtual skeletons may be used (e.g., a wireframe, a set of shape primitives, etc.). The position and motion of each joint in the virtual skeleton 568 within the interaction space 4 may be tracked over time by a skeletal tracking algorithm.

The game program 509 is configured to display on the display 508 game output 531 including a graphical representation of the user 5 in the virtual interaction space 4 based on virtual skeleton 568 generated by skeletal modeling 566 of the user. In other embodiments, the game output 531 may include a graphical representation of a virtual interaction space that includes elements which are responsive to recognized natural input. In this manner, a user may interact with a virtual GUI using recognized gestures for commands such as pressing buttons, manipulating sliders, pausing the game, etc.

Turning now to the computing environment in which the embodiments of the invention may be practiced, it will be appreciated that the computing device 102 of FIG. 1 and the game console of FIG. 5 are non-limiting examples of devices on which the embodiments may be practiced. The computing device 102 may take the form of a mainframe computer, server computer, desktop computer, laptop computer, tablet computer, home entertainment computer, network computing device, mobile computing device, mobile communication device, gaming device, etc. The computing device 102 may also optionally include user input devices such as keyboards, mice, game controllers, microphones, and/or touch screens, for example.

Processor 105 of computing device 102 may be, for example, a single core or multicore processor, and the programs that run on processor 105 may be configured for parallel processing. The processor 105 may also be distributed amongst several different computing machines, for distributed processing. Further, one or more aspects of the processor 105 may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.

Memory 106 is typically one or more physical devices configured as volatile memory to temporarily store data for a period of time in a non-transitory manner. As one example RAM may be used as memory 106. Mass storage 107 typically includes one or more physical devices configured as non-volatile memory to store data in a non-transitory manner, e.g., even when power is cut to the device. Examples of suitable mass storage devices include a magnetic storage device such as a hard drive, a non-volatile semiconductor storage device such as FLASH memory, and an optical storage device such as DVD-R. Typically, programs such as software program 109 are stored in a non-volatile manner in mass storage for execution on processor 105 using portions of memory 106, to implement the methods and processes described herein. In some embodiments, processor 105 and memory 106 may be integrated into one or more common devices, such as an ASIC or a SoC.

To provide for distribution of the software programs described herein, the system 100 may further include removable computer-readable storage media, which may be used to store and/or transfer data and/or instructions executable to implement the herein described methods and processes. Corresponding drives to read the computer-readable storage media may be provided on the computing device 102. The removable computer-readable storage media may take the form of CDs, DVDs, HD-DVDs, Blu-Ray Discs, EEPROMs, and/or floppy disks, among others.

It is to be appreciated that memory 106 and mass storage 107 include one or more physical, non-transitory devices. In contrast, in some embodiments aspects of the instructions described herein may be propagated in a transitory fashion through a transmission medium by a pure signal (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for at least a finite duration. Furthermore, data and/or other forms of information pertaining to the present disclosure may be propagated by a pure signal.

The term “program” refers to software components of system 100 that are implemented to perform one or more particular functions. In some cases, such a program may be instantiated via processor 105 executing instructions held by memory 106. For ease of explanation, the diagrams may illustrate such software programs stored on mass storage and interacting with other hardware and software components; however, it will be appreciated that instantiated threads of the programs executed by the processor actually perform such actions. The term “program” is meant to encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc. The computing device may be configured to run an operating system, which enables access of the software programs to various hardware resources, for example via software constructs referred to as application program interfaces.

The depth camera 128 of user input device 124 may include left and right cameras of a stereoscopic vision system, for example. Time-resolved images from both cameras may be registered to each other and combined to yield depth-resolved video. In other embodiments, depth camera 128 may be a structured light depth camera configured to project a structured infrared illumination comprising numerous, discrete features (e.g., lines or dots). Depth camera 128 may be configured to image the structured illumination reflected from a scene onto which the structured illumination is projected. Based on the spacing between adjacent features in the various regions of the imaged scene, a depth image of the scene may be constructed.

In other embodiments, depth camera 128 may be a time-of-flight camera configured to project a pulsed infrared illumination onto the scene. The depth camera may include two cameras configured to detect the pulsed illumination reflected from the scene. Both cameras may include an electronic shutter synchronized to the pulsed illumination, but the integration times for the cameras may differ, such that a pixel-resolved time-of-flight of the pulsed illumination, from the source to the scene and then to the cameras, is discernible from the relative amounts of light received in corresponding pixels of the two cameras.

In some embodiments, the user input device 124 may further include a visible light camera. Virtually any type of digital camera technology may be used without departing from the scope of this disclosure. As a non-limiting example, the visible light camera may include a charge coupled device image sensor.

Although the software program 109 has been described herein both generically as a software program and specifically as a gaming program, it will be appreciated that the software program 109 may alternatively be a speech recognition program, search engine program, image processing program, or other software program that performs decision tree processing, and thus the result of the decision tree computation task received from the decision tree computation device may be used by the software program in speech recognition, image processing, search query or other types of data processing. It will be appreciated that while the specific types of input data for these applications may be different from the depth image data of the embodiment of FIGS. 5 and 6, and the detailed design of the decision trees in these applications may also be different, the systems and methods described herein for processing the input data by evaluating the decision trees at least partially in hardware on decision tree computation device 104 may be effectively applied to decision tree analysis in these varied fields of endeavor without significant modification, since the fundamental processes for traversing the decision trees in each field are similar.

It is to be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

The invention claimed is:
 1. A computing device for use in decision tree computation, comprising: a processor; a software program stored in a mass storage device and executed by the processor to perform a decision tree task; and a decision tree computation device implemented in hardware as a logic circuit physically distinct from the processor, and which is linked to the processor by a communications interface, the decision tree computation device being configured to receive an instruction to perform a decision tree computation associated with the decision tree task from the software program, process the instruction, retrieve a plurality of items of input data and decision tree database data including a decision tree having a plurality of node descriptors, perform the decision tree computation by applying the decision tree to the plurality of items of input data, and return a result to the software program via the communications interface, wherein the hardware in which the decision tree computation device is implemented includes on-chip memory and is a field programmable gate array (FPGA), complex programmable logic device (CPLD), or application specific integrated circuit (ASIC), and wherein the computing device further includes off-chip memory in direct communication with the processor, the off-chip memory being physically distinct from the on-chip memory of the decision tree computation device; and wherein, to perform the decision tree computation, the decision tree computation device is configured to retrieve the plurality of items of input data from the off-chip memory physically distinct from the hardware in which the decision tree computation device is implemented, retrieve decision tree database data including the decision tree having a plurality of node descriptors from the off-chip memory, initialize a current list containing the input data, evaluate each item of input data based on node descriptors at a current level in the decision tree, and reorder the evaluated input data to be grouped by node descriptor in a next list to be evaluated according to the node descriptors at a next level of the decision tree.
 2. The computing device of claim 1, wherein the on-chip memory includes a prefetch cache, and the decision tree computation device is configured to prefetch decision tree database data from the off-chip memory physically distinct from the on-chip memory of the decision tree computation device, and store it in the prefetch cache of the on-chip memory.
 3. The computing device of claim 2, wherein the on-chip memory further includes a FIFO buffer, and the decision tree computation device is further configured to, in response to receiving the instruction to perform the decision tree computation, load the FIFO buffer with input data in the off-chip memory, for subsequent access and processing by the decision tree computation device.
 4. The computing device of claim 1, wherein the decision tree computation device is configured to perform binning of decision tree data loaded from off-chip memory physically distinct from the hardware in which the decision tree computation device is implemented.
 5. The computing device of claim 1, wherein the software program is further configured to display output on a display associated with the computing device, based directly or indirectly on the result received from the decision tree computation device.
 6. The computing device of claim 5, wherein the computing device is configured as a game console, the software program is configured to receive user input from a depth camera configured to capture depth image information as the user input, and the software program is configured to classify the depth image information via the decision tree computation task, based on the result received from the decision tree computation device.
 7. The computing device of claim 6, wherein the software program is further configured to: generate a depth map based on the captured depth image information, the depth map including a depth value for each pixel; tag a plurality of pixels in the depth map with player information for a player; and perform a decision tree task on the tagged plurality of pixels of the depth map for the player.
 8. The computing device of claim 1, wherein the result of the decision tree computation task received from the decision tree computation device is used by the software program in speech recognition, image processing, or search query processing.
 9. A method for use in decision tree computation, comprising: at a software program stored in a mass storage device and executed by a processor: receiving user input from a user input device associated with a computing device; in response to the user input, performing a decision tree task, the performing of the decision tree task including sending an instruction from the software program to a decision tree computation device implemented in hardware as a logic circuit physically distinct from the processor; at the decision tree computation device: receiving the instruction from the software program; retrieving a plurality of items of input data and decision tree database data including a decision tree having a plurality of node descriptors; performing a decision tree computation by applying the decision tree to the plurality of items of input data; and returning a result to the software program, wherein the hardware in which the decision tree computation device is implemented is a field programmable gate array (FPGA), complex programmable logic device (CPLD), or application specific integrated circuit (ASIC), and includes on-chip memory physically distinct from off-chip memory that is in direct communication with the processor of the computing device; wherein the instruction is sent from the software program to the decision tree computation device via a hardware communications interface; and wherein performing the decision tree computation further comprises retrieving the plurality of items of input data from the off-chip memory physically distinct from the hardware in which the decision tree computation device is implemented, retrieving decision tree database data including the decision tree having a plurality of node descriptors from the off-chip memory, initializing a current list containing the input data, evaluating each item of input data based on node descriptors at a current level in the decision tree, and reordering the evaluated input data to be grouped by node descriptor in a next list to be evaluated according to the node descriptors at a next level of the decision tree.
 10. The method of claim 9, wherein performing a decision tree computation at the decision tree computation device includes: prefetching decision tree database data from the off-chip memory utilized by the processor of the computing device; and storing the decision tree database data in a prefetch cache of the on-chip memory of the decision tree computing device.
 11. The method of claim 10, wherein performing the decision tree computation at the decision tree computation device further includes: in response to receiving the instruction to perform the decision tree computation, loading a FIFO buffer of the decision tree computation device with the decision tree data from the decision tree database in the off-chip memory for subsequent access and processing by the decision tree computation device.
 12. The method of claim 10, wherein performing the decision tree computation at the decision tree computation device further includes: in response to receiving the instruction to perform the decision tree computation, binning decision tree data loaded from the off-chip memory during processing of the decision tree data by the decision tree computation device for subsequent retrieval and processing.
 13. The method of claim 9, further comprising: displaying output on a display associated with the computing device, based directly or indirectly on the result received from the decision tree computation device.
 14. A computing device for use in decision tree computation, comprising: a processor; a software program stored in a mass storage device and executed by the processor to perform a decision tree task; and a decision tree computation device implemented in hardware as a logic circuit physically distinct from the processor, and which is linked to the processor by a communications interface, the decision tree computation device being configured to receive an instruction to perform a decision tree computation associated with the decision tree task from the software program, process the instruction, retrieve a plurality of items of input data and decision tree database data including a decision tree having a plurality of node descriptors, perform the decision tree computation by applying the decision tree to the plurality of items of input data, and return a result to the software program via the communications interface, wherein the hardware in which the decision tree computation device is implemented is a system-on-chip, and each of the processor and the decision tree computation device are separate logic units within the system-on-chip, and wherein the computing device further includes off-chip memory in direct communication with the processor, the off-chip memory being physically distinct from the on-chip memory of the decision tree computation device; and wherein, to perform the decision tree computation, the decision tree computation device is configured to retrieve the plurality of items of input data from the off-chip memory physically distinct from the hardware in which the decision tree computation device is implemented, retrieve decision tree database data including the decision tree having a plurality of node descriptors from the off-chip memory, initialize a current list containing the input data, evaluate each item of input data based on node descriptors at a current level in the decision tree, and reorder the evaluated input data to be grouped by node descriptor in a next list to be evaluated according to the node descriptors at a next level of the decision tree.